ChronoSAGE: Diversifying Topic Modeling Chronologically

Abstract

In this paper, we propose a new chronological modeling of topics latent in documents. We apply sparse additive generative models (SAGE) [5] so that topic modeling results are diversified chronologically by using document timestamps. We call our approach ChronoSAGE. SAGE represents each word probability as the exponential of a sum of multiple parameters, each representing a different facet of the documents. To utilize document timestamps, we therefore prepare three types of parameters: those for each topic, those for each timestamp, and those for each topic-timestamp pair. Consequently, word tokens are generated not only in a topic-specific manner but also in a time-specific manner. We first compare ChronoSAGE and vanilla SAGE with LDA in terms of pointwise mutual information (PMI) [10] to show the practical effectiveness of SAGE-type approaches. We then give examples of time-differentiated latent topics obtained by ChronoSAGE to show the usefulness of our chronological topic modeling. As another contribution, we also provide an approximate inference method that makes implementation far easier.
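The additive construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the background log-frequency term (a standard ingredient of SAGE, assumed here), and all numeric values are hypothetical and chosen only to show how topic, timestamp, and topic-timestamp parameters might combine in log space before normalization.

```python
import math

def word_distribution(background, topic_dev, time_dev, pair_dev):
    """Normalize exp(sum of log-space parameters) over a toy vocabulary.

    Each argument is a list with one entry per vocabulary word:
    background -- assumed background log-frequencies
    topic_dev  -- deviations for one topic
    time_dev   -- deviations for one timestamp
    pair_dev   -- deviations for that (topic, timestamp) pair
    """
    scores = [m + a + b + c
              for m, a, b, c in zip(background, topic_dev, time_dev, pair_dev)]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Toy four-word vocabulary; every number below is invented for illustration.
background = [math.log(p) for p in (0.4, 0.3, 0.2, 0.1)]
topic_dev = [0.0, 1.0, 0.0, 0.0]  # the topic raises word 1
time_dev = [0.0, 0.0, 0.5, 0.0]   # the timestamp raises word 2
pair_dev = [0.0, 0.0, 0.0, 2.0]   # the topic at this timestamp raises word 3

zeros = [0.0] * 4
baseline = word_distribution(background, zeros, zeros, zeros)
probs = word_distribution(background, topic_dev, time_dev, pair_dev)
```

With all deviations set to zero the sketch returns the background distribution unchanged, so in this toy setup any shift in `probs` comes from the topic, timestamp, or pair parameters alone.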

Similar resources

TIA-INAOE Participation at ImageCLEF 2008

This paper describes the participation of the INAOE’s research group on machine learning for image processing and information retrieval from México. This year we proposed two approaches for the photographic retrieval task. First, we studied the annotation-based expansion of documents for image retrieval. This approach consists of automatically assigning labels to images by using supervised mach...

An Optimization Method for Proportionally Diversifying Search Results

The problem of diversifying search results has attracted much attention, since diverse results can provide non-redundant information and cover multiple query-related topics. However, existing approaches typically assign equal importance to each topic. In this paper, we propose a novel method for diversification: proportionally diversifying search results. Specifically, we study the problem of r...

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

Topic Modeling and Classification of Cyberspace Papers Using Text Mining

The global cyberspace networks provide individuals with platforms to interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...

Why do firms undertake diversifying mergers? An analysis of the investment. . .

Recent empirical literature documents value-destroying "cross-subsidization" among the divisions of diversified firms. However, this literature relies upon two maintained hypotheses: that divisions of diversified firms are randomly allocated to their corporate parents and that the investment opportunities facing conglomerate divisions are identical to those of stand-alone firms in their industries. This...

Publication date: 2014